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One in six cancers worldwide is caused by infection and human papillomavirus (HPV) is one of the main culprits. To 
better understand the dynamics of HPV integration and its effect on both the viral and host methylomes, we conducted 
whole-genome DNA methylation analysis using MeDIP-seq of HPV+ and HPV- head and neck squamous cell carcinoma 
(HNSCC). We determined the viral subtype to be HPV-16 in all cases and show that HPV-16 integrates into the host 
genome at multiple random sites and that this process predominantly involves the transcriptional repressor gene (E2) in 
the viral genome. Comparative analysis identified 453 (FDR < 0.01) differentially methylated regions (DMRs) in the HPV+ 
host methylome. Bioinformatics characterization of these DMRs confirmed the previously reported cadherin genes to 
be affected but also revealed new targets for HPV-mediated methylation changes at regions not covered by array-based 
platforms, including the recently identified super-enhancers. 



Introduction 

One in six cancers worldwide is believed to be caused by infection, 
with human papillomavirus (HPV) being one of the leading cul- 
prits. 1 HPV is known to represent a major independent risk fac- 
tor for Head and Neck Squamous Cell Carcinoma (HNSCC). 2,3 
It is particularly associated with oropharyngeal carcinoma, of 
which 20-50% test positive for the HPV-16 subtype, showing 
expression of the E6 and E7 viral oncogenes. 2 " 4 HPV infected and 
uninfected (hereafter referred to as HPV+ and HPV-) HNSCCs 
display differences in clinical behavior: Although HPV+ cancer 
shows an increased propensity to metastasize, HPV infection 
is generally associated with a more favorable clinical outcome; 
patients with HPV+ HNSCC respond better to chemotherapy 
and radiotherapy (82% vs. 55% response rate for HPV- tumors) 
and have higher overall survival rates (95% vs. 62% at two years 
after diagnosis). 5 The cause for this difference between HPV+ 
and HPV- tumors remains unknown. 

Upon persistent infection, HPV can integrate into the host cell 
genome, where it becomes subject to epigenetic modification. 6,7 
The HPV genome does not encode for any DNA methyltrans- 
ferases (DNMT) and thus it is believed that the viral genome 
is methylated by DNMTs of the human host cell. 6 This may be 
directed by the virus in order to regulate intrinsic cell-differen- 
tiation dependent and temporal viral gene expression through 



cross-talk between viral proteins and DNMTs. Alternatively, 
methylation of viral DNA may represent a potential host defense 
mechanism, with the host aiming to alter viral gene expression 
once the virus has integrated into the host genome. 

In cervical epithelium, HPV-DNA hypermethylation is more 
closely associated with cervical carcinoma than with asymptom- 
atic infection or with dysplasia. 7,8 Methylation of the LI gene of 
both HPV-16 and HPV-18 has been reported in cervical cancer 
and in penile cancer. 8 " 11 Methylation changes within HPV-16 
in HNSCC have been demonstrated by Balderas-Loaeza et al. 12 
They reported methylation of the viral genome near the 3' end of 
the LI ORF (3 CpGs) and in the Long Control Region (LCR), 
which regulates the expression of the viral oncoproteins E6 and 
E7 (5 CpGs). The most extensive study of methylation of the 
HPV-16 genome in HNSCC to date used bisulfite-sequencing 
(BS-Seq) to determine the methylation status of all 110 CpG 
sites within the viral genome in advanced stage HPV+ HNSCC. 
Contrary to previous findings, methylation within the viral LCR 
was not observed in most cases, and methylation in regions other 
than the LCR was detected to varying degrees. 13 

In this study, we show the potential for identifying both the 
presence of HPV and the localization of HPV integration sites 
from MeDIP-seq data. Additionally, we utilize the MeDIP-seq 
methylomes to identify and characterize regions of differential 
methylation between HPV+ and HPV- cohorts. 
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Results 

HPV type 16 was confirmed in all HPV+ samples and meth- 
ylation within the viral genome was demonstrated. MeDIP-seq 
data from 3 fresh-frozen (FF) HPV- (HN29, HN32, and HN96) 
and 3 FF HPV+ (HN39, HN105, and HN125) HNSCCs were 
analyzed using the MeDUSA computational pipeline 14 (Table 
SI). We attempted to utilize the MeDIP-seq data set to confirm 
the presence of HPV in the HPV+ samples and the absence of 
HPV in the HPV- samples. 

For each sample, those sequence reads that failed to align 
to the human reference genome were aligned against a viral 
"metagenome" containing 1776 viral genomes obtained from 
NCBI Genomes, including 29 human papillomavirus strains 
(Table S2). All HPV+ HNSCC samples were found to con- 
tain sequence that aligned to HPV type 16 but not other HPV 
strains. As expected, there were no matches to any HPV type in 
HPV- samples. 

To aid comparison between samples, reads that aligned to the 
viral genome were normalized according to average HPV copy 
number per cell (Sample HN39 = 96.94 copies per cell, HN105 = 
110.44 copies per cell, HN125 = 0.66 copies per cell) and further 
normalized to reads per million (RPM). It should be noted that 
the measured viral load does not account for potential differences 
in the ratio of integrated to episomal viral DNA between samples. 

The most pronounced methylation signal was observed at the 
boundary of the LI and the L2 gene and within the El gene 



(Fig. 1). Methylation in the boundary region was detected in 
all samples investigated. The sequence of this part of the viral 
genome harbors six CpG sites at genomic position 5600, 5606, 
5609, 5615, 5707, and 5724. The first four CpG sites are located 
within a sequence of only 16 bp and the remaining two follow 
92 bp and 109 bp downstream, respectively. 

Validation of methylation at the boundary of the L1/L2 
ORF of the viral genome in fresh-frozen HNSCC samples and 
in an independent set of HPV+ HNSCC cell lines by BS-Seq. 
The results obtained by MeDIP-Seq were validated by applying 
the targeted BS-Seq technique (see Methods) to a 421 bp region 
of the viral genome, covering the six CpG sites of interest and two 
additional CpG sites in genomic positions 5925 and 5961, in the 
three FF HPV+ HNSCC samples. BS-Seq was also used to test 
the methylation status of this region in three HPV+ HNSCC cell 
lines (UPCI:SCC90, UM:SCC47, 93VU-147T). Methylation at 
all six CpG sites at genomic positions 5600, 5606, 5609, 5615, 
5707, and 5724 was observed (Fig. SI). 

Integration sites identified within the host genome. Next, 
we selected for paired-end reads in which one aligned with high 
confidence to HPV type 16 and the other to human DNA; these 
constitute potential "integration site" reads (n = 33). While the 
integration sites across the human genome appear to be ran- 
dom, there is a significant enrichment of potential integration 
sites within the viral E2 (E4 and E5) region, as illustrated in 
Figures 2 and 3. 

Genome-wide methylome analysis of HPV+ and HPV- 
HNSCCs. MeDUSA was used to locate differentially methyl- 
ated regions (DMRs) between the HPV+ and HPV- cohorts. 
This analysis led to the identification of 1168 DMRs between 
the HPV+ and HPV- cohort at a false-discovery rate of 0.01. 
An amount of 782 of these DMRs were found to be enriched 
or hypermethylated in the HPV+ cohort relative to the HPV- 
cohort. The remaining 386 showed reduced signal in the HPV+ 
relative to HPV- cohort; we refer to this as hypomethylation. 
A substantial number of hypermethylated DMRs were found to 
reside on the X chromosome. This signal was clearly driven by 
the single sample derived from a female patient (HN125) (Fig. 
S2). As a result, DMRs on the sex chromosome were removed 
and recalled using only the five samples derived from males. This 
resulted in the total DMR count falling to 830 (439 hypermeth- 
ylated and 391 hypomethylated DMRs). 

Copy number analysis was performed on the six samples 
using the Illumina HumanOmnil-Quad BeadChip genotyp- 
ing array. These data revealed that potential copy number vari- 
able (CNV) regions from the six samples cover 64% of the 
genome. "False-enriched CNV regions" were defined as regions 
that show a gain in copy number, defined as a score of > 2.3, 
in the HPV+ cohort or a loss of copy number, defined as a 
score of < 1.7, in the HPV- cohort. Conversely, "false-reduced 
CNV regions" were defined as showing a loss in copy number, 
defined as a score of < 1.7, in the HPV+ cohort, or a gain in 
copy number, defined as a score of > 2.3, in the HPV- cohort. 
Significant association of hypermethylated DMRs with false- 
enriched CNV regions (P < 0.001, permutation test) but not 
with false-reduced CNV regions (P = 0.976, permutation test) 



Figure 1. HPV type 16 methylome in HNSCC. The blue peaks represent 
regions of the viral genome in which methylation was detected using 
MeDIP-seq. Methylation was measured as reads per million per HPV 
copy per cell. Sites of methylation within the viral genome were mainly 
detected at the boundary of the L1/L2 ORF and within the El ORF (cre- 
ated in Circos 39 ). 
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Figure 2. Potential integration sites of HPV type 16 in the human genome and illustration of location of "integration site reads" in the HPV genome, 
representing the genomic region of the virus disrupted during the integration process. Links colored dark blue indicate those located in the HPV 
region most significantly enriched for integration sites. HPV and human genome drawn at different scale (created in Circos 39 ). 



was observed. Likewise, hypomethylated DMRs were associ- 
ated with false-reduced CNV regions (P < 0.001, permutation 
test) but not with false-enriched CNV regions (P = 0.996, per- 
mutation test). Therefore, hypermethylated DMRs falling into 
"false-enriched CNV regions" and hypomethylated DMRs fall- 
ing into "false-reduced CNV regions" were removed, yielding 
a final set of 209 hypermethylated and 244 hypomethylated 
DMRs. 



These DMRs were compared with previously published data 
from Lechner et al. 15 This data set was comprised of 20 HPV+ and 
20 HPV- samples derived from formalin-fixed paraffin-embed- 
ded (FFPE) tissue, with the analysis performed on the Illumina 
Infinium 450K platform. A significant trend (P = < 0.001, K-S 
test) is found whereby the fold change in probes located within 
a DMR between HPV+ and HPV- is in concordance with the 
directionality of the MeDIP-seq DMR (Fig. 4A). This trend 
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Figure 3. Enrichment of potential integration sites in the HPV genome as measured by -log transformed empirical P-value. Region containing the E3, 
E4, and E5 genes display the most significant degree of enrichment. 



becomes stronger if only DMRs that contain > 5 probes are 
included (Fig. 4B). In reality, the majority of the MeDIP-seq 
DMRs (71% [149/209] hypermethylated and 64% [155/244] 
hypomethylated) actually contain fewer than 5 Infinium probes. 
Of these DMRs, however, 98% contained > 5 CpGs. This 
suggests that there are potentially interesting methylation vari- 
able regions, containing numerous CpG sites, absent from the 
Infinium array. Such regions can currently only be detected by 
genome-wide techniques such as MeDIP-seq. 

Utilizing our DMR set, we sought to determine significant 
associations with different genomic features. Both hyper- and 
hypo-methylated DMRs were strongly associated (P value < 
0.001, permutation test) with CpG islands, CpG island shores, 
promoter regions and genes (Fig. S3A). Within genes, hypometh- 
ylated DMRs (P = 0.025 FDR-adjusted, 1-tailed Fisher's Exact 
Test), but not hypermethylated DMRs (P = 0.83), showed a pref- 
erence for coding genes (Fig. S3C, left panel). Additionally, there 
was strong association with exons (P = 2.2 x 10" 28 and p = 1.6 x 
10~ 59 for hyper- and hypomethylated DMRs respectively, FDR- 
adjusted, 1-tailed Fisher's Exact test) but not with introns (P = 
1) (Fig. S3C, right panel). Within CpG islands, we observed a 
strong association with promoter CpG islands (P = 1.59 x 10" 9 
for hypermethylated DMRs, P= 6.13 x 10" 15 for hypomethylated 
DMRs, both FDR-adjusted, 1-tailed Fisher's Exact test) but not 
3'-, intergenic, or intragenic CpG islands (Fig. 3SB). 

To determine whether the DMRs were associated with par- 
ticular genomic regulatory features, the Ensembl ENCODE reg- 
ulatory segmentation 16 was utilized. Using the output from the 
programs chromHMM 17 and Segway, 18 the segmentation classi- 
fies the genome into regions of similar signal over 14 assays to 
obtain a summary of the functional architecture of the human 
genome. Of the six cell lines for which this segmentation data 
are available, we selected the HepG2 cell line as being the most 
similar to oropharyngeal HNSCC. As expected, we discovered 
significant association with predicted promoter regions in both 
hyper- and hypomethylated DMRs (P < 0.001 Fisher's Exact 
test, FDR- adjusted). Additionally, there was a highly significant 
association (P < 0.001 Fisher's Exact test, FDR adjusted) with 



predicted enhancer regions. This was supported by significant 
association with the ENCODE ChlP-seq data for the histone 
mark H3K4mel (HepG2) (P < 0.001 Fisher's Exact test, FDR- 
adjusted) and H3K27ac (HepG2) (P < 0.001 Fisher's Exact 
test, FDR-adjusted) known to be present at enhancer regions. 19 
Furthermore, utilizing annotations from previously identified 
multiple myeloma super-enhancers, 20 we observe overlaps in 28 
of our DMRs. 

Putative transcription factor binding sites in regions where 
DMRs overlap with CpG islands were identified by both affinity 
prediction using TRAP 21 and association with published ChlP- 
seq data from ENCODE. Interestingly, given their role in HPV 
associated cancers, 22 E2F1 and E2F6 binding sites are signifi- 
cantly enriched in targets of both enriched and reduced DMRs 
(Table S3). Importantly, the sets of target genes of hypermeth- 
ylated and hypomethylated DMRs are non-overlapping (Tables 
S4 and S5). E2F1 and E2F6 themselves do not appear to be dif- 
ferentially methylated in HPV+ with respect to HPV- HNSCC. 
Association of methylation variable positions (MVPs) with E2Fs 
has also been detected in the analysis of 45 OK array data, thus 
reinforcing that the key findings by Lechner et al. 15 can be repli- 
cated with a small data set. 

Gene enrichment analysis using PANTHER 23 revealed that 
promoter-associated hypermethylated DMRs affect genes asso- 
ciated with the biological processes cell-cell adhesion (P = 5.83 
x 10~ 3 , Bonferroni corrected) and ion transport (P = 0.0387, 
Bonferroni corrected). Hypomethylated DMRs found in pro- 
moter regions are associated with cation transport (P = 3.61 x 
10" 2 , Bonferroni corrected). Target genes associated with cell- 
cell adhesion include members of the cadherin family (CDH2, 
CDH4, CDH6, and CDH11). This family was also identified 
as significantly associated with MVPs in the analysis of a much 
larger data set using the 450K array technology. 15 

Discussion 

We have analyzed MeDIP-seq data on 3 HPV- and 3 HPV+ 
HNSCCs to characterize both the viral and host methylomes. 
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Figure 4. Log fold change obtained from comparison 
of FFPE HPV+ (n = 20) and FFPE HPV- (n = 20) probes 
on lllumina Infinium 450K array 15 located within (A) all 
MeDIP-seq DMR regions and (B) MeDIP-seq DMR regions 
containing > 5 probes. 



Given the small sample size, we first attempted to val- 
idate previous findings 15 using our independent data 
set obtained with a different method. Additionally, 
unlike the previous array-based analysis, we were 
able to utilize the same MeDIP-seq data to interro- 
gate the viral methylome and potential sites of viral 
integration. 

Analysis of HPV-induced changes in the tumor 
methylome identified 453 regions of differential 
methylation with high confidence, given a 1% false- 
discovery rate for DMR identification and removal 
of DMRs falling into relevant copy-number variable 
regions. CpG Island regions that are targeted by dif- 
ferential methylation were found to harbor bind- 
ing sites for a range of transcription factors, notably 
E2F1 and E2F6. While the gene sets targeted by 
hypermethylated and hypomethylated DMRs are 
non-redundant, there are target genes within each 
class that harbor both E2F1, E2F6 and other tran- 
scription factor binding sites within the methylated 
region. Within the E2F family, E2F1 is known to 
act as an activator transcription factor to promote 
cell proliferation. 24 E2F6, on the other hand, lacks 
the Cdk- and Rb-interaction domains found in other 
members of the E2F family. 24 Given the resolution 
limit of the MeDIP-seq technique (-200 bp), it is 
not possible to definitively state which binding site is 
affected by methylation in these particular cases. On 
a global scale, however, it is interesting to note that 
many genes targeted by differential methylation are 
involved in cell-cell adhesion processes as supported 
by gene enrichment analysis. This includes members 
of the cadherin family as well as NCAM2. Given the 
known role of these proteins in the epithelial-to-mes- 
enchymal transition and metastasis, 25 it is tempting 
to speculate that epigenetic changes induced by HPV 
infection might underlie the difference in meta- 
static potential between HPV+ and HPV- HNSCC 
observed in the clinic. We conclude that we can rep- 
licate some of the key findings from previous analy- 
sis of the methylomes of HPV+ and HPV- HNSCC 
with MeDIP-seq and a small data set. 

One key advantage of MeDIP-seq over 450K in 
this context is that it allows us to interrogate the 
viral methylome. In line with previous reports show- 
ing that HPV type 16 is found in almost all cases 
of HPV+ HNSCC, 26 all FF HPV+ HNSCC samples 
tested in this study had integrated copies of HPV 
type 16. Specifically, methylation within one region 
of the HPV type 16 genome was detected consistently 
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across all samples. This 124 bp region lies at the boundary of the 
LI and the L2 gene and harbors a total of 6 CpG sites. 

Methylation in this location has previously been shown in 
cervical cancer. 7,8 Fernandez et al. 7 showed that methylated CpG 
sites within this region of both the HPV type 16 and HPV type 
18 genome is present in the majority of cervical cancer samples 
tested, rather than in premalignant cervical lesions and in car- 
riers. As part of their comprehensive HPV type 16 methylome 
analysis Park et al. 13 assessed the methylation status of this region 
in advanced stage III/IV HNSCC patients. Although not spe- 
cifically mentioned by the authors, they found consistent meth- 
ylation of CpGs within this region across every sample tested. 
The main result presented by Park et al. 13 was that large parts 
of the HPV type 16 genome, in particular the LCR, which con- 
trols transcription of the E6 and E7 oncogenes, 27 was found to 
be unmethylated in the majority of examined cases. In line with 
the findings by Park et al. 13 the LCR region is found unmethyl- 
ated in all FF HPV+ HNSCC samples tested in this study, sug- 
gesting that methylation in this genomic location may be a rare 
event in HNSCC. The LCR harbors multiple binding sites for 
E2. Methylation within the LCR is known to prevent the bind- 
ing of E2, thereby increasing the expression of E6 and E7, the two 
key oncogenes. 6 Thus, one would expect to find methylated CpG 
sites within this region. One explanation may be that, as the E2 
gene is often disrupted or lost upon viral integration, the selective 
pressure to methylate this genomic segment is lost in the absence 
of the transcriptional repressor E2. 13 

Based on the findings by Fernandez et al., 7 it is conceivable 
that methylation of the viral genome plays an important role in 
HPV-induced carcinogenesis, rather than a bystander role only. 
Moreover, the boundary of the LI and L2 gene is reported more 
frequently methylated in cervical cancer samples than in prema- 
lignant and normal tissue. 8 Irrespective of the potential effects of 
methylation in this specific location, it may serve as a diagnostic 
biomarker for HPV+ HNSCC. Park et al. 13 employed methyla- 
tion-specific PCR (MSP) to test for LCR methylation status in 
serum and saliva samples of HPV+ HNSCC patients. As no con- 
trol group was included in that study, it can be regarded as a suc- 
cessful feasibility trial. However, based on the findings discussed 
in this section and the results by Brandsma et al. 8 and Fernandez 
et al., 7 testing the methylation status of this region at the bound- 
ary of L1/L2 in serum and saliva samples may represent a far 
more powerful diagnostic biomarker. 

Apart from methylation at the Ll/2 boundary region of the 
viral genome, methylation within the El gene was detected to 
varying degrees in the FF HPV+ HNSCC tissue data set. The 
significance of these findings remains to be investigated. 

A further advantage of having employed MeDIP-Seq is that 
potential integration sites can be detected by selecting for paired- 
end reads that align both to the viral and host genome ("inte- 
gration site reads"). Our data show that integration sites across 
the human genome appear to be random. However, there was a 
significant enrichment of potential "integration site reads" within 
the viral E2 (E4 and E5) region, suggesting that this is a com- 
mon location of disruption of the viral sequence upon integration 
into the host genome in HNSCC. This is in line with integration 



events observed for cervical HPV infection. 28 It is noteworthy 
that the disruption of E2 function (an inhibitor of E6 and E7 
viral gene expression) leads to increased expression of the E6 and 
E7 viral oncogenes and, thus, may represent a key step in viral 
carcinogenesis. 

In conclusion, we show that integrated virus-host methylome 
analysis using MeDIP-seq complements array-based studies for 
the identification of novel targets for differential DNA methyla- 
tion in HPV+ HNSCC. 

Materials and Methods 

Human tissue samples and cell lines used for the experi- 
ments. Ethical approval for the study of human tissue samples 
was granted by the UCL/UCLH Ethics Committee (Reference 
number 04/Q0505/59). Three fresh-frozen (FF) HPV+ and 
three FF HPV- HNSCC samples (Table S6) were obtained 
from the UCLH Head and Neck Tumour Bank and were used 
for MeDIP-Seq analysis. HNSCC cell lines UPCLSCC090 
(HPV+), UPCLSCC003 (HPV-), UPCLSCC036 (HPV-) and 
PCI-30 (HPV—) were generous gifts from Dr Susanne Gollin 
and Dr Theresa Whiteside (University of Pittsburgh Cancer 
Institute, US). 93VU-147T (HPV+) was a generous gift from Dr 
Hans Joenje (VU Medical Center) and UM:SCC047 (HPV+) 
was from Dr Thomas Carey (University of Michigan). All cell 
lines were maintained in Dulbecco's Modified Eagle's Medium 
(DMEM) supplemented with 10% fetal bovine serum (FBS) and 
penicillin/streptomycin. The main features of these cell lines are 
described in Table S7. HPV status was confirmed by E6 qPCR 
from both FF and cell line samples as described previously. 29 

DNA extraction. DNA from FF tumor and cell line sam- 
ples was extracted using the QIAamp DNA Blood Mini Kit 
(QIAGEN). This DNA purification method yields DNA sized 
from 200 bp to 50 kb. Thus, it is possible that episomal viral 
DNA was co-purified together with the genomic DNA that 
included integrated viral DNA. 

MeDIP-Seq data analysis of the HPV genome. DNAs from 
3 HPV+ and 3 HPV- fresh-frozen HNSCC samples were sub- 
jected to AutoMeDIP-seq as previously described, 30 using the 
NEBNext DNA Sample Prep Mastermix (New England Biolabs) 
for the library preparation and the MagMeDIP kit (Diagenode) 
for immunoprecipitation. Adequate enrichment of the methyl- 
ated DNA fraction (compared with input) was quality controlled 
using qPCR. Following adaptor-mediated PCR, the library was 
subjected to size selection (300-350 bp) from low melting-point 
agarose gels. The excised fraction was quality controlled by 
qPCR. Cluster generation and 36 base paired-end sequencing 
was performed by the UCL Genomics Core Facility according to 
the manufacturer's recommendation, using an Illumina GAIIX 
platform. 

The data were analyzed using the MeDUSA pipeline. 14 First, 
paired-end sequencing reads were aligned to the human refer- 
ence genome (hgl9) using BWA (v0.5.8), 31 with default param- 
eters. Next, SAMTools 32 and custom scripts were used to remove 
(1) incorrectly mapped pairs, (2) pairs in which neither read 
obtained a Phred score of alignment quality greater than 10, 
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and (3) duplicate fragments (i.e., fragments of exactly the same 
coordinates), which likely represent artifacts of PCR amplifica- 
tion. Read counts are illustrated in Table SI. Quality control was 
performed using FastQC (vO.9.4, http://www.bioinformatics. 
bbsrc.ac.uk/projects/fastqc/) and MEDIPS (vl.0.0). 33 Next, the 
MultipleReplicaScanSeqs (MRSS) and EnrichedRegionMaker 
functions of the USeq software suite (v6.8) 34 were used to iden- 
tify DMRs between the HPV+ and HPV- cohorts. MRSS pro- 
cesses point data for use in the BioConductor package DESeq. 35 
Window size was set to 500. Only regions containing a mini- 
mum of 10 reads summed from the two cohorts were included 
for DMR analysis. DESeq's variance outlier filter function was 
bypassed. 

DMRs were further filtered to remove those located on the 
sex chromosomes due to the presence of a single female sam- 
ple in the HPV+ cohort. Additionally, hypermethylated DMRs 
falling into "false-enriched CNV regions" and reduced DMRs 
falling into "false-reduced CNV regions" were removed. "False- 
enriched CNV regions" were defined as regions that show a gain 
in copy number, defined as a score of >2.3, in the HPV+ cohort 
or a loss of copy number, defined as a score of <1.7, in the HPV- 
cohort. Conversely, "false-reduced CNV regions" were defined 
as showing a loss in copy number, defined as a score of <1.7, in 
the HPV+ cohort, or a gain in copy number, defined as a score of 
>2.3, in the HPV- cohort. 

CNVs were determined using the HumanOmnil-Quad bead 
chip (Illumina). 1 ug of DNA was utilized and processed accord- 
ing to the manufacturers instructions. Copy number alterations 
were identified using Illumina Genome Studio. The average log 
ratio of the X chromosome in male compared with female was 
used to determine the value of a single copy alteration as 0.3, 
hence a gain was defined as >2.3 and a loss as <1.7. 

DMR annotation. DMRs identified by the MeDUSA 
pipeline at a false-discovery rate of 0.01 were annotated using 
BEDTools. 36 Feature annotation was obtained from Ensembl 
(Build 70) . 16 A DMR was annotated with a feature if a minimum 
51% overlap in one direction, i.e., 51% of the DMR overlap- 
ping the feature or 51% of the feature overlapping the DMR, 
was observed. Statistical testing was performed using either 
Fisher's exact test or a permutation test. For both these tests, all 
regions containing a minimum of 10 reads summed from the 
two cohorts were used as genomic background thus accounting 
for biases in the genomic distribution of MeDIP-seq reads. 

Gene enrichment analysis. Gene enrichment analysis was 
performed on the subset of DMRs falling into promoter regions 
(minimum 51% overlap) against the whole-genome background 
using the web tool PANTHER (v8.0). 23 

Identification of putative transcription factor binding sites. 
Putative transcription factor binding sites in regions where 
DMRs overlap with CpG islands were identified by two meth- 
ods: (1) regions were submitted to the online tool TRAP 21 
for affinity-based prediction of TF binding sites. (2) Overlap 
between DMRs and Encode ChlP-seq data for various DNA 
binding factors was determined (minimum 51% overlap) and 
the significance of association determined using Fisher's Exact 
Test. 



Characterization of the HPV methylome. Reads that failed 
to align to the human genome were mapped, using BWA, 31 to 
1776 viral genomes, including 29 human papillomaviruses (Table 
S2), obtained from NCBI Genomes (http://www.ncbi.nlm. 
nih.gov/genomes/GenomesHome. cgi?taxid = 10239) (accessed 
28/03/2011). 

Filtering was performed using SAMtools (v0.1.9) 32 to remove 
erroneously mapped and low quality (score of <10) reads. In 
order to compare viral aligned reads between samples, the read 
counts were normalized for viral load by dividing the number 
of reads in 10 bp windows by the viral load as estimated by E6 
qPCR. Further normalization to account for total library size led 
to read counts being transformed to reads per million (RPM) as 
described in Chavez et al. 2010. 33 

Additionally, the MeDIP-seq data were analyzed to identify 
potential viral integration sites. For this, reads that failed to form 
a correctly aligned pair in the human genome were stored. For 
each HPV+ sample, these stored reads were aligned to the HPV- 
16 (NC_001526) genome using BWA. Pairs in which one read 
mapped to the viral genome with an alignment score >10 and 
the corresponding mate failed to map, were identified (n = 182). 
These reads were compared with the relevant human alignment 
to find pairs. In order to be classed as a pair, the human read 
also required an alignment score > 0 and to have an unmapped 
mate (n = 660428). Therefore the resulting pair comprises of 
one read aligning with high confidence to HPV-16 and the other 
to human, indicating that a potential integration site is located 
within the fragment (n = 33). 

To determine if there was a significant enrichment of these 
reads within a specific location on the viral genome, counts of the 
aligned '"integration site" reads were obtained for 500 bp win- 
dows with 10 bp slide. Given the low number of "integration 
site" reads, permutation analysis was performed (n = 10000) to 
indicate whether the read counts obtained in the 500 bp win- 
dows were different to that expected by chance. An additional 
factor to take account of was that, by chance, we were more likely 
to find an "integration site" read in regions of the viral genome 
that had more MeDIP reads aligning. Thus the probability of an 
"integration site" read aligning to a specific genomic region was 
weighted according to the total number of reads aligning in that 
window. Permutations were performed, utilizing the probabili- 
ties determined for each window, by placing 33 simulated reads 
across the viral genome. The number falling within each window 
was counted and compared with the real data. This was repeated 
(n = 10 000) to obtain an empirical P value. The resulting empiri- 
cal p-values were log transformed (-loglO) and plotted along the 
HPV16 genome. 

Validation of obtained results by bisulfite sequencing 
(BS-Seq). Generation of methylated and unmethylated control 
samples. In vitro methylation of HPV+ HNSCC DNA samples 
was performed using CpG Methyltransferase (M.Sssi) (New 
England Biolabs Inc.), according to the protocol of the sup- 
plier, in order to obtain a positive control for subsequent BS-Seq 
analysis. 

Whole-genome amplification was performed using the 
REPLI-g Mini Kit (QIAGEN), according to the protocol of 
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the supplier, in order to obtain a negative control for subsequent 
BS-Seq analysis. 

Bisulfite modification. One microgram of genomic DNA 
from 3 FF HPV+ HNSCC tissue samples (analyzed by MeDIP- 
Seq previously) and from 3 HPV+ HNSCC cell lines was bisul- 
fite converted using the Zymo EZ DNA Methylation kit (Zymo 
Research Corp, cat No. D5001). The protocol was modified to 
improve bisulfite conversion efficiency by inclusion of a cyclic 
denaturation step as described previously. 37 Bisulfite conver- 
sion efficiency was confirmed by methylation-specific PCR 
(MSP) for a methylated and unmethylated DNA fragment, as 
described. 37 

Targeted HPV methylome analysis. In order to obtain the DNA 
segment of interest (the region at the boundary of the LI and L2 
gene within the HPV-16 genome), the targeted modified DNA 
fragment was amplified using touchdown PCR with lx Reaction 
Buffer, 0.5 mM dNTPs, 2.0-3.0 mM MgCL,, 0.4 u,M of for- 
ward and reverse primers respectively and 1 unit of HotStar Taq 
(Qiagen), in a total volume of 20 |xl (Qiagen). Reagents were 
denatured at 95 °C for 10 min, followed by 20 cycles of 95 °C for 
45 sec, 60 °C with a decrease of half a degree per cycle for 45 sec, 
68 °C for 60 sec and 30 cycles of 95 °C for 45 sec, 50 °C for 45 
sec, 68 °C for 60 sec and ending with a seven-minute extension 
at 68 °C. One in vitro methylated control sample (positive con- 
trol) and one unmethylated (whole-genome amplified) control 
sample (negative control) were included in each PCR, as previ- 
ously described. 38 PCR primers used for BS-Seq, covering the 6 
CpG sites of interest in a region at the boundary of the LI and L2 
gene within the HPV-16 genome, were used to validate the viral 
methylome analysis employing MeDIP-Seq. Sequences of these 
primers were obtained from 13 and are summarized in Table S8. A 
product of 421 bp (see Fig. S4) was generated by PCR using the 
PCR program outlined above. 

The specificity of the reaction products was inspected using 
1.5% agarose gel electrophoresis before being purified using the 
Illustra ExoStar 1-Step kit (GE Healthcare Life Sciences). 

Bisulfite sequencing. Methylation analysis was performed with 
bisulfite genomic sequencing using the bisulfite modified and 
PCR amplified and purified DNA segment of interest 421 bp seg- 
ment at the border of the LI and L2 gene in the HPV-16 genome, 
harboring the 6 CpG sites of interest in genomic positions 5600, 
5606, 5609, 5615, 5707, and 5724 (and two additional CpG 
sites). The same primers (Table S8) were used for BS-Seq. 

After bisulfite treatment and PCR amplification, a subclon- 
ing step is frequently included before sequencing. This is done 
to avoid a molecular weight shift, detected during the sequenc- 
ing reaction, when CpG rich regions are analyzed. This shift is 
a result of the mixture of methylated (+ CH 3 ) and unmethylated 
(- CH 3 ) CpG sites within the segment of interest in a 



heterogeneous DNA population. In this case however, the seg- 
ment of interest only contained 8 CpG sites (within a 421 bp viral 
DNA segment). Thus, no significant molecular weight shift was 
expected. In order to cause a molecular weight shift of one base, 
the DNA segment would have to contain 22 CpG sites that are 
differentially methylated in the cell population. Hence, a sub- 
cloning step was not required. Although Sanger sequencing is 
not a quantitative method, the results obtained from the positive 
(in vitro methylated) and the negative (whole-genome amplified) 
control, confirms the methylation status of this locus in the HPV 
type 16 genome. 

BS-Seq was performed by the UCL Genomics Core Facility 
according to the manufacturer's recommendation using Applied 
Biosystems Bigdye3.1 chemistry. Sequencing reactions were 
cleaned up using in-house-made sephadex filtration plates and 
then run on Applied Biosystems 3730XL Genetic Analyzer. 
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Note 

MeDIP-Seq data were submitted to GEO (Gene Expression 
Omnibus, NCBI) according to the instructions provided (GEO 
accession number: GSE38263). 
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